NSF PAR Search | NSF Public Access Repository


On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

Xiong, Guojun; Wang, Shufan; Jiang, Daniel; Li, Jian (April 2025, The Thirteenth International Conference on Learning Representations (ICLR 2025))

Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without needing to share the local trajectories collected during agent-environment interactions. In practice, however, the environments faced by different agents are often heterogeneous, and because existing FedRL algorithms learn a single policy across all agents, performance can be poor. In this paper, we introduce a personalized FedRL framework (PFedRL) that takes advantage of possibly shared common structure among agents in heterogeneous environments. Specifically, we develop a class of PFedRL algorithms named PFedRL-Rep that learns (1) a shared feature representation collaboratively among all agents, and (2) an agent-specific weight vector personalized to its local environment. We analyze the convergence of PFedTD-Rep, a particular instance of the framework with temporal difference (TD) learning and linear representations. To the best of our knowledge, we are the first to prove a linear convergence speedup with respect to the number of agents in the PFedRL setting. To achieve this, we show that PFedTD-Rep is an example of federated two-timescale stochastic approximation with Markovian noise. Experimental results demonstrate that PFedTD-Rep, along with an extension to the control setting based on deep Q-networks (DQN), not only improves learning in heterogeneous settings, but also provides better generalization to new environments.
Free, publicly-accessible full text available April 24, 2026
Transient Stability Enhancement via a Scalable RL Method with VSG Parameter Tuning

https://doi.org/10.1109/IECON55916.2024.10905298

Huang, Xiaoge; Zhang, Ziang; Wang, Shufan; Li, Jian (November 2024, IEEE)

Full Text Available
Whittle Index-Based Q-Learning for Wireless Edge Caching With Linear Function Approximation

https://doi.org/10.1109/TNET.2024.3417351

Xiong, Guojun; Wang, Shufan; Li, Jian; Singh, Rahul (October 2024, IEEE/ACM Transactions on Networking)

Full Text Available
Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks

https://doi.org/10.1109/TWC.2024.3416437

Wang, Shufan; Xiong, Guojun; Zhang, Shichen; Zeng, Huacheng; Li, Jian; Panwar, Shivendra S (October 2024, IEEE Transactions on Wireless Communications)

Full Text Available
Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints

https://doi.org/10.1609/aaai.v38i14.29489

Wang, Shufan; Xiong, Guojun; Li, Jian (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Restless multi-armed bandits (RMAB) have been widely used to model sequential decision-making problems with constraints. The decision maker (DM) aims to maximize the expected total reward over an infinite horizon under an “instantaneous activation constraint” that at most B arms can be activated at any decision epoch, where the state of each arm evolves stochastically according to a Markov decision process (MDP). However, this basic model fails to provide any fairness guarantee among arms. In this paper, we introduce RMAB-F, a new RMAB model with “long-term fairness constraints”, where the objective now is to maximize the long-term reward while a minimum long-term activation fraction for each arm must be satisfied. For the online RMAB-F setting (i.e., the underlying MDPs associated with each arm are unknown to the DM), we develop a novel reinforcement learning (RL) algorithm named Fair-UCRL. We prove that Fair-UCRL ensures probabilistic sublinear bounds on both the reward regret and the fairness violation regret. Compared with off-the-shelf RL methods, our Fair-UCRL is much more computationally efficient since it contains a novel exploitation that leverages a low-complexity index policy for making decisions. Experimental results further demonstrate the effectiveness of our Fair-UCRL.
Full Text Available
Perceived experts are prevalent and influential within an antivaccine community on Twitter

https://doi.org/10.1093/pnasnexus/pgae007

Harris, Mallory J.; Murtfeldt, Ryan; Wang, Shufan; Mordecai, Erin A.; West, Jevin D.; Ognyanova, Katherine (ed.) (February 2024, PNAS Nexus)

Perceived experts (i.e. medical professionals and biomedical scientists) are trusted sources of medical information who are especially effective at encouraging vaccine uptake. The role of perceived experts acting as potential antivaccine influencers has not been characterized systematically. We describe the prevalence and importance of antivaccine perceived experts by constructing a coengagement network of 7,720 accounts based on a Twitter data set containing over 4.2 million posts from April 2021. The coengagement network primarily broke into two large communities that differed in their stance toward COVID-19 vaccines, and misinformation was predominantly shared by the antivaccine community. Perceived experts had a sizable presence across the coengagement network, including within the antivaccine community where they were 9.8% of individual, English-language users. Perceived experts within the antivaccine community shared low-quality (misinformation) sources at similar rates and academic sources at higher rates compared to perceived nonexperts in that community. Perceived experts occupied important network positions as central antivaccine users and bridges between the antivaccine and provaccine communities. Using propensity score matching, we found that perceived expertise brought an influence boost, as perceived experts were significantly more likely to receive likes and retweets in both the antivaccine and provaccine communities. There was no significant difference in the magnitude of the influence boost for perceived experts between the two communities. Social media platforms, scientific communications, and biomedical organizations may focus on more systemic interventions to reduce the impact of perceived experts in spreading antivaccine misinformation.
Reinforcement Learning for Dynamic Dimensioning of Cloud Caches: A Restless Bandit Approach

https://doi.org/10.1109/TNET.2023.3235480

Xiong, Guojun; Wang, Shufan; Yan, Gang; Li, Jian (January 2023, IEEE/ACM Transactions on Networking)

We study the dynamic cache dimensioning problem, where the objective is to decide how much storage to place in the cache to minimize the total costs with respect to the storage and content delivery latency. We formulate this problem as a Markov decision process, which turns out to be a restless multi-armed bandit problem and is provably hard to solve. For given dimensioning decisions, it is possible to develop solutions based on the celebrated Whittle index policy. However, Whittle index policy has not been studied for dynamic cache dimensioning, mainly because cache dimensioning needs to be repeatedly solved and jointly optimized with content caching. To overcome this difficulty, we propose a low-complexity fluid Whittle index policy, which jointly determines dimensioning and content caching. We show that this policy is asymptotically optimal. We further develop a lightweight reinforcement learning augmented algorithm dubbed fW-UCB when the content request and delivery rates are unavailable. fW-UCB is shown to achieve a sub-linear regret as it fully exploits the structure of the near-optimal fluid Whittle index policy and hence can be easily implemented. Extensive simulations using real traces support our theoretical results.
Full Text Available
Reinforcement Learning for Dynamic Dimensioning of Cloud Caches: A Restless Bandit Approach

Xiong, Guojun; Wang, Shufan; Yan, Gang; Li, Jian (May 2022, IEEE International Conference on Computer Communications (IEEE INFOCOM))

We study the dynamic cache dimensioning problem, where the objective is to decide how much storage to place in the cache to minimize the total costs with respect to the storage and content delivery latency. We formulate this problem as a Markov decision process, which turns out to be a restless multi-armed bandit problem and is provably hard to solve. For given dimensioning decisions, it is possible to develop solutions based on the celebrated Whittle index policy. However, Whittle index policy has not been studied for dynamic cache dimensioning, mainly because cache dimensioning needs to be repeatedly solved and jointly optimized with content caching. To overcome this difficulty, we propose a low-complexity fluid Whittle index policy, which jointly determines dimensioning and content caching. We show that this policy is asymptotically optimal. We further develop a lightweight reinforcement learning augmented algorithm dubbed fW-UCB when the content request and delivery rates are unavailable. fW-UCB is shown to achieve a sub-linear regret as it fully exploits the structure of the near-optimal fluid Whittle index policy and hence can be easily implemented. Extensive simulations using real traces support our theoretical results.
Full Text Available
Parametric Bootstrap for Differentially Private Confidence Intervals

Ferrando, Cecilia; Wang, Shufan; Sheldon, Daniel (January 2022, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS))

The goal of this paper is to develop a practical and general-purpose approach to construct confidence intervals for differentially private parametric estimation. We find that the parametric bootstrap is a simple and effective solution. It cleanly reasons about variability of both the data sample and the randomized privacy mechanism and applies "out of the box" to a wide class of private estimation routines. It can also help correct bias caused by clipping data to limit sensitivity. We prove that the parametric bootstrap gives consistent confidence intervals in two broadly relevant settings, including a novel adaptation to linear regression that avoids accessing the covariate data multiple times. We demonstrate its effectiveness for a variety of estimators, and find empirically that it provides confidence intervals with good coverage even at modest sample sizes and performs better than alternative approaches.
Full Text Available
